Data driven subword unit modeling for speech recognition and its application to interactive reading tutors
Authors
Abstract
This paper proposes a novel token-passing search architecture for subword-unit-based speech recognition, together with an algorithm, based on the well-known LZW text compression method, that determines a vocabulary of subword units in an unsupervised manner. We compare our subword unit selection algorithm with an existing approach based on Minimum Description Length (MDL) modeling and with syllable representations for English. Our approach is shown to yield units that share properties with syllables but are determined in a language-independent, data-driven manner. Using the token-passing architecture, which combines word-level and subword unit representations, we apply the proposed framework to oral reading tracking within an interactive literacy tutor for children. The proposed architecture is shown to provide advantages over whole-word speech recognition for recognizing and detecting oral reading events.
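The abstract does not spell out the unit selection algorithm, so the following is only a rough illustration of the underlying idea: textbook LZW dictionary growth applied to phone sequences instead of characters. The function name `lzw_units`, the `max_units` cap, and the toy phone symbols are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Unit = Tuple[str, ...]


def lzw_units(sequences: Sequence[Sequence[str]],
              max_units: Optional[int] = None) -> List[Unit]:
    """Grow an LZW-style dictionary of multi-symbol units (hypothetical helper).

    Each sequence is a list of symbols (for example, phones). The returned
    list holds the units in the order they were added to the dictionary.
    """
    units: Dict[Unit, int] = {}

    # Seed the dictionary with every single symbol, as classic LZW does.
    for seq in sequences:
        for sym in seq:
            units.setdefault((sym,), len(units))

    # Scan each sequence; whenever the longest known unit cannot be
    # extended by the next symbol, register the extended unit as new.
    for seq in sequences:
        current: Unit = ()
        for sym in seq:
            candidate = current + (sym,)
            if candidate in units:
                current = candidate
            else:
                # Assumed cap on inventory size; None means unbounded.
                if max_units is None or len(units) < max_units:
                    units[candidate] = len(units)
                current = (sym,)

    return list(units)


if __name__ == "__main__":
    toy = [["b", "a", "b", "a", "b", "a"]]
    for unit in lzw_units(toy):
        print(" ".join(unit))
```

Frequently repeated symbol runs are extended by one symbol each time they recur, which is why the resulting units tend to grow toward recurring multi-phone chunks. A real system would still need a policy for pruning and for segmenting words with the final inventory, which this sketch leaves out.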
Similar resources
Highly accurate children's speech recognition for interactive reading tutors using subword units
Speech technology offers great promise in the field of automated literacy and reading tutors for children. In such applications speech recognition can be used to track the reading position of the child, detect oral reading miscues, assess comprehension of the text being read by estimating whether the prosodic structure of the speech is appropriate to the discourse structure of the story, or by en...
Improved Subword Modeling for WFST-Based Speech Recognition
Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating modified lexic...
Data-driven Pronunciation Modeling for AS
We describe a method to model pronunciation variation for ASR in a data-driven way, namely by use of automatically derived acoustic subword units. The inventory of units is designed so as to produce maximally separable pronunciation variants of words while at the same time only the most important variants for the particular application are trained. In doing so, the optimal number of variants per ...
Speech Recognition Using Demi-Syllable Neural Prediction Model
The Neural Prediction Model is a speech recognition model based on pattern prediction by multilayer perceptrons. Its effectiveness was confirmed by speaker-independent digit recognition experiments. This paper presents an improvement in the model and its application to large vocabulary speech recognition, based on subword units. The improvement involves an introduction of "backward predic...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...